Abstract. The proliferation of sensing devices creates a plethora of data
streams, which in turn can be harnessed to carry out sophisticated analytics
that support various real-time applications and services as well as long-term
planning, e.g., in the context of intelligent cities or smart homes, to name
two prominent examples. A mature cloud infrastructure brings such a vision
closer to reality than ever before. However, we believe that the ability of
data owners to flexibly and easily control the granularity at which they share
their data with other entities is very important, both in making data owners
comfortable with sharing in the first place and in letting them leverage such
fine-grained control to realize different business models or logics. In this
paper, we explore some basic operations to flexibly control access to a data
stream and propose a framework, eXACML+, that extends OASIS's XACML model [1]
to achieve this. We develop a prototype using the commercial StreamBase
engine [2] to demonstrate a seamless combination of stream data processing
with (a small but important selected set of) fine-grained access control
mechanisms, and study the framework's efficacy through experiments in
cloud-like environments.



1 Introduction 

Wide-scale deployments of sensors and smart mobile devices, as well as emerging 
technologies and trends such as Internet of Things (IoT) and participatory sens- 
ing - all envision developing interesting real-time data stream driven applications 
not only based on the isolated data-streams generated by individual sources, but 
also by mashing them up together to generate more sophisticated services. Such 
services vary in scale and scope - from smart homes to intelligent cities 0, which 
may fuse together data all owned by a single owner or from many parties. The 
pervasive cloud infrastructure is an important catalyst in enabling such visions, 
because (i) it is cost effective [3] for individual data owners since it provides 
the necessary computational and storage resources elastically, (ii) it allows fast 
prototyping, deployment and testing of new applications and analytics at large
scale, and equally importantly, (iii) by naturally achieving collocation of data
from different sources that may be needed to build such complex applications, 
it reduces the barriers of sharing and collaboration. 

For example, a flu outbreak within one or several cities can be tracked, stud- 
ied (and necessary intervention measures taken) using real-time data from say 
hospitals, transport departments, weather stations, as well as telecom compa- 
nies. In the pre-cloud era, finding and sharing such data would encounter several 
layers of barriers. 

However, in order to facilitate meaningful sharing of data from data-owners 
to various users, it is crucial that the data-owners have adequate controls on 
what they wish to share and with whom. Such controls are desirable for several 
reasons, including maintaining ownership of data, privacy of content selectively, 
as well as monetization of the data by differentiated pricing and exposing data 
in different details to different users. In the absence of fine-grained control,
or if the control comes only at prohibitive cost (in negotiating, interpreting
or implementing it), it is more than likely that the data-owners would simply
decide not to share. In our previous work [4], we proposed some simple
extensions to XACML 1 to represent and enforce fine-grained access controls
(such as time-window based aggregation, trigger/threshold based access, etc.) on
archived data, i.e., data stored in a relational database. In this paper, we look
at the relatively harder problem of how to achieve similar fine-grained access
control on data streams.

Thus, while the sharing predicates we explore remain the same as in our
previous work, the main challenge and contribution of the current work is
precisely to deal with data streams, which rely on a fundamentally different set
of technologies than data stored in an RDBMS. Developing an access control model
and mechanism for data streams is more challenging due to the characteristics
of stream data and of Data Stream Management Systems (DSMS):

- The data model used in a DSMS is fundamentally different from those used
in a traditional RDBMS. A DSMS deals with unbounded and fast-changing
time-series tuples in data streams. Access control enforcement, particularly
when it is based on the content (such as a value-based trigger or range
predicate), is not a one-time operation but a continuous procedure applied on
the data streams. Whenever a new data tuple arrives, the corresponding access
control actions must be taken on it. Therefore, models and technologies
developed for access control enforcement on RDBMS, such as those in [5], cannot
be readily adapted for DSMS.

- Temporal constraints, i.e., sliding windows, play a crucial role in DSMS. In
addition to normal constraints such as selection and projection, window-based
aggregate operations also need to be considered in the access control model.

In this work we propose the eXACML+ framework by extending our previous
work on fine-grained access control in RDBMS, namely eXACML [4], which
itself extends the popular XACML [1] standard. The eXACML+ framework
adds fine-grained access control to a popular data stream management model,
namely the Aurora model [6]. The main contributions of this paper are:



1. We extend XACML to enable fine-grained access control for continuous
queries [7], i.e., standing queries that continuously process new incoming
data streams. We express the fine-grained access control policies within the
obligations blocks of XACML policies and make the Policy Enforcement Point (PEP)
generate corresponding continuous queries from the obligations, provided the
Policy Decision Point (PDP) grants the access (see footnote 2). These continuous
queries are expressed as query graphs and are sent to the back-end DSMS for
processing. We refer to this approach as XACML+.

2. We integrate the various components in the eXACML+ framework. The 
framework consists of entities including data server, XACML+ instances, proxy 
server and client interface. Users send requests for data streams together with 
customized continuous queries and obtain stream handles, which point to the 
unique resource identifiers (URIs) of the processed data streams from the frame- 
work. 

3. We show that customized queries issued by users, if not handled carefully,
can give rise to information leaks in the case of sliding window policies (which
our framework can detect and prevent). We also discuss possible improvements
to the system efficiency by informing users of empty/partial results due to
policy and query mismatches.

4. We instantiate the eXACML+ framework using Aurora's commercialized 
software StreamBase [5] as the back-end DSMS. We evaluate the performance of 
our prototype in a cloud-like environment. The results indicate that the frame- 
work incurs relatively constant overhead and is scalable. 

The current work assumes a trusted cloud service provider, which itself has 
access to all the stored data, and furthermore honestly enforces the sharing con- 
straints specified by the data-owners. Content confidentiality from the cloud ser- 
vice provider itself, and the service provider's accountability are also important 
and much researched topics, which are necessary extensions for the presented 
work but are out of the scope of this paper. 

The rest of the paper is organized as follows. Section [2] describes details
of XACML+, i.e., our extension to XACML to support stream data. Section [3]
discusses the design of the eXACML+ framework. The prototype and evaluation
of the framework are presented in Section [4]. Section [5] discusses related
work in the relevant research area, and finally we conclude our work and propose
future work in Section [5].

2 XACML policies for Stream Data 
2.1 Overview of XACML and Aurora Model 

The eXtensible Access Control Markup Language (XACML) [1] is an OASIS framework
for specifying and enforcing access control. Policies are written in XML and
contain elements including subjects, resources, actions, obligations, etc. The
framework consists of two main components: a Policy Decision Point (PDP) and a
Policy Enforcement Point (PEP). The former manages policies and evaluates user
requests against the stored policies, the results of which are permit or deny
decisions. The PEP's main role is to marshal user requests and the PDP results.
In addition to the permit/deny decision, the PDP also returns a set of
obligations to the PEP. We extend this process for fine-grained access control
by embedding parts of the policies in obligations, which are then processed by
the PEP.

2 PEP and PDP are standard terms associated with the XACML technology stack, and
are described in detail in the next section.

Aurora [6] is a popular model for stream data, which has matured into a
commercial product, i.e., the StreamBase engine [5]. In this model, a data stream
consists of an append-only sequence of tuples with the same schema. A query
on a data stream is modelled as a directed acyclic graph of operators (also
called boxes), which we refer to as a query graph. The query graph is applied to
each tuple from the stream such that each tuple in the output data stream
satisfies all predicates in the query graph. StreamBase also comes with support
for StreamSQL, which is an SQL-like representation of query graphs.

The Aurora model supports a number of operators (or boxes), but in this
work we focus on three common ones: filter (selection), map (projection) and
window-based aggregation (aggregate functions applied on sliding windows). A
filter operator has a condition $C$, a boolean expression composed of logic
operators (AND, OR, NOT) and equality and inequality operators
($<, >, \le, \ge, =, \ne$). A map operator contains a set of projected
attributes $S$. A window-based aggregation operator consists of the sliding
window (specified by the window type, size and advance step), the set of
attributes and the aggregate functions to be computed over each window.
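
As an informal illustration of these three operators, the following Python sketch applies a filter, a map and a tuple-based sliding-window aggregation to a small, finite list of tuples. It is not Aurora or StreamBase code: the function names (`filter_op`, `map_op`, `window_aggregate`), the dict-based tuple representation and the sample readings are all assumptions made for this example, and a real DSMS would process an unbounded stream incrementally rather than a list.

```python
from statistics import mean

def filter_op(stream, condition):
    """Filter box: keep only the tuples for which condition C holds."""
    return [t for t in stream if condition(t)]

def map_op(stream, attributes):
    """Map box: project every tuple onto the attribute set S."""
    return [{a: t[a] for a in attributes} for t in stream]

def window_aggregate(stream, size, step, functions):
    """Tuple-based sliding window: emit one aggregated tuple per full window."""
    out = []
    for start in range(0, len(stream) - size + 1, step):
        window = stream[start:start + size]
        out.append({a: f([t[a] for t in window]) for a, f in functions.items()})
    return out

# Hypothetical readings: (rainrate, windspeed), one tuple per sampling time.
samples = [(6, 3), (8, 5), (2, 9), (7, 4), (9, 1), (11, 2), (12, 6)]
readings = [{"samplingtime": i, "rainrate": r, "windspeed": w}
            for i, (r, w) in enumerate(samples)]

visible = filter_op(readings, lambda t: t["rainrate"] > 5)
visible = map_op(visible, ["samplingtime", "rainrate"])
result = window_aggregate(
    visible, size=3, step=2,
    functions={"samplingtime": lambda values: values[-1],  # last value
               "rainrate": mean})
print(result)
```

The three boxes are chained in the order filter, map, aggregate, so the window is counted over the tuples that survive the filter, not over the raw stream.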

2.2 Fine-grained Access Control Policies 

To better illustrate our access control model for stream data, we use the following 
example throughout the paper. 

Example 1. The National Environmental Agency (NEA) wishes to provide a
real-time weather data service through the Cloud platform. The data has the
schema (samplingtime, temperature, humidity, solar radiation, rain rate, wind
speed, wind direction, barometer) and is generated every thirty seconds by a
weather station. Instead of creating one customised data stream for each
individual customer, NEA decides to use the cloud's access control mechanism.
The benefits of this approach have been discussed in [4]. The Land Transport
Authority (LTA) is developing an automatic warning system that alerts drivers of
possible traffic congestion due to heavy rain. The warning system requires
real-time weather data from NEA, which specifies the following policy: 1) only
samplingtime, rain rate and wind speed data are visible; 2) data should come in
windows of size 5 with an advance step of size 2, and the functions applied on
samplingtime, rain rate and wind speed are lastValue, average and maximum,
respectively; 3) data is visible only when the rain rate is greater than
5mm/hour.

Figure 1 shows the Aurora query graph that transforms the original weather
data stream so that it satisfies the above access control scenario. Using the
obligation-based approach described in [4], we create new obligation elements,
one for every operator (listed in Table 1). The detailed information of each
obligation element is as follows:

Fig. 1. Aurora query graph for the example in Section 2.2

Table 1. Obligation types

1. Filter: consists of a string attribute with attribute ID
exacml:obligation:stream-filter-condition-id. The value is a string
representing a boolean expression used as the filter condition C.

2. Map: consists of a set of string attributes with attribute ID
exacml:obligation:stream-map-attribute-id. The values are attribute names, used
to restrict access to only authorized attributes, such as rain rate and wind
speed in the example.

3. Window-Based Aggregation: consists of a number of attributes:

- Window type: string attribute with attribute ID
exacml:obligation:stream-window-type-id. It specifies whether the window size is
based on the number of tuples or the number of time units.

- Window size: integer attribute with attribute ID
exacml:obligation:stream-window-size-id. It specifies the size of the window in
number of tuples or time units.

- Window advance step: integer attribute with attribute ID
exacml:obligation:stream-window-step-id. It specifies how fast the window
advances on the stream.

- Aggregation attribute: string attribute with attribute ID
exacml:obligation:stream-window-attr-id. It specifies the attribute in the data
stream schema that is to be aggregated, and what the aggregate function is. Its
value is of the form attribute-id:aggregate-function, where attribute-id is the
name of the attribute and aggregate-function is an element from the set of
aggregate functions {Avg, Max, Min, Count, LastValue, FirstValue, ...}.



Once we have defined the individual obligation elements, we can combine 
them to form the obligations block of the XACML policy for the access scenario. 
Figure 2 shows how these obligation elements are used in the policy.



<Obligations>
  <Obligation ObligationId="exacml:obligation:stream-filter" FulfillOn="Permit">
    <AttributeAssignment AttributeId="pCloud:obligation:stream-filter-condition-id"
        DataType="http://www.w3.org/2001/XMLSchema#string">rainrate > 5</AttributeAssignment>
  </Obligation>
  <Obligation ObligationId="exacml:obligation:stream-map" FulfillOn="Permit">
    <AttributeAssignment AttributeId="pCloud:obligation:stream-map-attribute-id"
        DataType="http://www.w3.org/2001/XMLSchema#string">samplingtime</AttributeAssignment>
    <AttributeAssignment AttributeId="pCloud:obligation:stream-map-attribute-id"
        DataType="http://www.w3.org/2001/XMLSchema#string">rainrate</AttributeAssignment>
    <AttributeAssignment AttributeId="pCloud:obligation:stream-map-attribute-id"
        DataType="http://www.w3.org/2001/XMLSchema#string">windspeed</AttributeAssignment>
  </Obligation>
  <Obligation ObligationId="exacml:obligation:stream-window" FulfillOn="Permit">
    <AttributeAssignment AttributeId="pCloud:obligation:stream-window-step-id"
        DataType="http://www.w3.org/2001/XMLSchema#integer">2</AttributeAssignment>
    <AttributeAssignment AttributeId="pCloud:obligation:stream-window-size-id"
        DataType="http://www.w3.org/2001/XMLSchema#integer">5</AttributeAssignment>
    <AttributeAssignment AttributeId="pCloud:obligation:stream-window-type-id"
        DataType="http://www.w3.org/2001/XMLSchema#string">tuple</AttributeAssignment>
    <AttributeAssignment AttributeId="pCloud:obligation:stream-window-attr-id"
        DataType="http://www.w3.org/2001/XMLSchema#string">samplingtime:lastval</AttributeAssignment>
    <AttributeAssignment AttributeId="pCloud:obligation:stream-window-attr-id"
        DataType="http://www.w3.org/2001/XMLSchema#string">rainrate:avg</AttributeAssignment>
    <AttributeAssignment AttributeId="pCloud:obligation:stream-window-attr-id"
        DataType="http://www.w3.org/2001/XMLSchema#string">windspeed:max</AttributeAssignment>
  </Obligation>
</Obligations>



Fig. 2. Obligation portion of the XACML policy for the example in Section 2.2
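
To make the role of the obligations block more concrete, the sketch below shows one way a PEP could read such a block and recover the operator parameters. It is only an illustration under stated assumptions: the helper `parse_obligations`, the embedded (abridged) XML string and the resulting dictionary layout are hypothetical and are not taken from the prototype; only Python's standard `xml.etree.ElementTree` module is used.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the obligations block, embedded so the sketch is self-contained.
OBLIGATIONS_XML = """
<Obligations>
  <Obligation ObligationId="exacml:obligation:stream-filter" FulfillOn="Permit">
    <AttributeAssignment AttributeId="pCloud:obligation:stream-filter-condition-id"
        DataType="http://www.w3.org/2001/XMLSchema#string">rainrate > 5</AttributeAssignment>
  </Obligation>
  <Obligation ObligationId="exacml:obligation:stream-window" FulfillOn="Permit">
    <AttributeAssignment AttributeId="pCloud:obligation:stream-window-step-id"
        DataType="http://www.w3.org/2001/XMLSchema#integer">2</AttributeAssignment>
    <AttributeAssignment AttributeId="pCloud:obligation:stream-window-size-id"
        DataType="http://www.w3.org/2001/XMLSchema#integer">5</AttributeAssignment>
    <AttributeAssignment AttributeId="pCloud:obligation:stream-window-type-id"
        DataType="http://www.w3.org/2001/XMLSchema#string">tuple</AttributeAssignment>
    <AttributeAssignment AttributeId="pCloud:obligation:stream-window-attr-id"
        DataType="http://www.w3.org/2001/XMLSchema#string">samplingtime:lastval</AttributeAssignment>
    <AttributeAssignment AttributeId="pCloud:obligation:stream-window-attr-id"
        DataType="http://www.w3.org/2001/XMLSchema#string">rainrate:avg</AttributeAssignment>
    <AttributeAssignment AttributeId="pCloud:obligation:stream-window-attr-id"
        DataType="http://www.w3.org/2001/XMLSchema#string">windspeed:max</AttributeAssignment>
  </Obligation>
</Obligations>
"""

def parse_obligations(xml_text):
    """Group assignment values by obligation and by the last part of the AttributeId."""
    spec = {}
    for obligation in ET.fromstring(xml_text.strip()).iter("Obligation"):
        kind = obligation.get("ObligationId").rsplit(":", 1)[-1]   # e.g. "stream-window"
        values = {}
        for assignment in obligation.iter("AttributeAssignment"):
            key = assignment.get("AttributeId").rsplit(":", 1)[-1]
            values.setdefault(key, []).append(assignment.text.strip())
        spec[kind] = values
    return spec

spec = parse_obligations(OBLIGATIONS_XML)
window = spec["stream-window"]
# Each aggregation attribute has the form attribute-id:aggregate-function.
aggregates = dict(v.split(":", 1) for v in window["stream-window-attr-id"])
print(spec["stream-filter"]["stream-filter-condition-id"][0])
print(window["stream-window-size-id"][0], window["stream-window-step-id"][0], aggregates)
```

Keeping every value as a list per attribute ID mirrors the fact that the map and aggregation obligations repeat the same AttributeId once per attribute.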



3 The eXACML+ Framework 



We describe in this section the design of the eXACML+ framework, which is a
natural extension of eXACML with additional functionality to manage Aurora
query graphs and customised queries issued by users. The architecture of
eXACML+ resembles that of eXACML except that 1) new XACML+ instances are added
to the framework to handle access control needs on data streams and 2) the
policy management module is updated in response to the change of data model.
Figure 3(a) illustrates the architecture of eXACML+. It includes entities such
as the cloud server, XACML+ and XACML* instances, a proxy with a cache feature
and a client interface. We also discuss two important issues that were not
covered in the original eXACML framework: 1) why allowing multiple windows on
the same data stream could violate data privacy and 2) how we alert users if
their queries contradict the access control policies enforced on the data
streams and would therefore yield empty or partial result sets.




(a) eXACML+ (b) XACML+ 



Fig. 3. Architecture of XACML+ and eXACML+ framework 
3.1 Handling User Query 

In many cases, the data stream accessible by the user may not directly fit the
actual requirement. In our example, suppose that the LTA later finds out that
only rain rates over 50mm/hr influence traffic conditions, and that the warning
system only needs data from sliding windows of size 10 (instead of the original
5 tuples). The LTA could process the incoming data stream locally. However,
having such additional filtering done by a server (the cloud) is preferable. In
our framework, the user sends a customised query to the PEP. The query acts as
a request to apply additional operations on the authorized stream. We implement
the query in XML form, as shown in Figure 4(a). The PEP transforms this into
an Aurora query graph similar to that in Figure 1 and then combines it with
the query graph derived from the policy obligations. One could simply
concatenate the two graphs, but properly merging them brings advantages such
as reducing the number of operators in the query graph and therefore improving
efficiency. It also allows for detection of empty/partial results (which we
refer to as NR and PR). Merging two query graphs is equivalent to merging each
type of operator in the graphs. We explain how NR and PR can be detected during
the merging process later in Section 3.5. The rules for merging individual types
of operators are:

- Two filter operators $F_1$ and $F_2$ with conditions $C_1$ and $C_2$ are merged
into a filter $F_3$ with the condition $C_3 = (C_1)$ AND $(C_2)$. There are cases
in which $C_3$ can be further simplified. For example, if $C_1 = x > v_1$ and
$C_2 = x > v_2$, $C_3$ can be written as $x > v_2$ iff $v_2 \ge v_1$.

- Two map operators $M_1$ and $M_2$ with attribute sets $S_1$ and $S_2$ are merged
into a new operator $M_3$ with the attribute set $S_3 = S_1 \cup S_2$.

- Two window-based aggregation operators $A_1$ and $A_2$ are merged only if
the following conditions are met: 1) the window types are the same; 2) supposing
$A_1$ is derived from the policy obligations and $A_2$ from the user query,
$A_1$'s window size and advance step must be less than or equal to those of
$A_2$. The second condition



CREATE INPUT STREAM weather (
    samplingtime timestamp, temperature double,
    humidity double, rainrate double,
    windspeed double, winddirection int,
    barometer double);

CREATE STREAM internal_0;

SELECT * FROM weather WHERE rainrate > 50 INTO internal_0;

CREATE OUTPUT STREAM internal_1;

SELECT internal_0.samplingtime, internal_0.rainrate
FROM internal_0 INTO internal_1;

CREATE OUTPUT STREAM output;

CREATE WINDOW _10tuple (SIZE 10 ADVANCE 2 TUPLES);
SELECT lastval(samplingtime) AS lastvalsamplingtime,
       avg(rainrate) AS avgrainrate
FROM internal_1 [_10tuple] INTO output;

(a) User Query in XML (b) StreamSQL statements

Fig. 4. User Query and StreamSQL



is to ensure that users are not given more fine-grained data than permitted by
the policy. The new operator $A_3$ will have the same window type as $A_1$ and
$A_2$, and its window size and advance step are the same as those of $A_2$. The
aggregation functions and attribute sets are the intersection of those from
$A_1$ and $A_2$.
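
The filter and window rules can be sketched in a few lines of Python. This is a hedged illustration, not the prototype's code: the `Window` class and the functions `merge_filters` and `merge_windows` are hypothetical names, conditions are kept as plain strings, and the merged window is assumed to take its size and advance step from the user's window (the larger, coarser one), which is how we read the rule above.

```python
from dataclasses import dataclass

@dataclass
class Window:
    type: str        # "tuple" or "time"
    size: int
    step: int
    functions: dict  # attribute name -> aggregate function name

def merge_filters(c1, c2):
    """Merged filter condition C3 = (C1) AND (C2), without simplification."""
    return f"({c1}) AND ({c2})"

def merge_windows(policy, user):
    """Return the merged window, or None if the merge conditions are not met."""
    if policy.type != user.type:
        return None
    # The user may only ask for windows at least as coarse as the policy allows.
    if policy.size > user.size or policy.step > user.step:
        return None
    # Keep only (attribute, function) pairs present in both operators.
    common = {a: f for a, f in user.functions.items()
              if policy.functions.get(a) == f}
    return Window(user.type, user.size, user.step, common)

policy_window = Window("tuple", 5, 2, {"samplingtime": "lastval",
                                       "rainrate": "avg",
                                       "windspeed": "max"})
user_window = Window("tuple", 10, 2, {"rainrate": "avg"})

print(merge_filters("rainrate > 5", "rainrate > 50"))
print(merge_windows(policy_window, user_window))
print(merge_windows(policy_window, Window("tuple", 3, 2, {"rainrate": "avg"})))
```

The last call returns `None` because a window of size 3 is finer than the policy's size 5 and therefore cannot be merged.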

Figure 4(b) shows the StreamSQL statements after merging the query graph
in Figure 1 with the user query in Figure 4(a).



3.2 Design of XACML+ 

Figure 3(b) shows the design of XACML+, which is an extension of the original
OASIS XACML model [1]. The workflow is as follows:

1. The PEP receives a user's request for accessing a stream, together with a
customized query, which are then forwarded to the PDP. The customised query
is also converted into an Aurora query graph.

2. The PDP evaluates the request against the stream's policies and returns the
decision and obligations (if any) to the PEP. If the decision is Permit, the PEP
generates a query graph from the obligations.

3. The PEP checks that, for the credentials included in the request, no query is
currently being applied to the same data stream. The reason for this is given in
Section 3.4.

4. The PEP merges the two query graphs derived from the obligations and the user
query, during which PR or NR are checked.

5. If no PR or NR warning is detected, the merged query graph is
converted into a StreamSQL script and sent to the data stream engine. A handle,
in URI form, is returned to the user.



<UserQuery>
  <Stream name="weather" />
  <Filter>
    <FilterCondition>
      RainRate > 50
    </FilterCondition>
  </Filter>
  <Map>
    <Attribute>RainRate</Attribute>
  </Map>
  <Aggregation>
    <WindowType>tuple</WindowType>
    <WindowSize>10</WindowSize>
    <WindowStep>2</WindowStep>
    <Attribute>avg(RainRate)</Attribute>
  </Aggregation>
</UserQuery>



3.3 Query Graph Management 

In the original eXACML framework, which handles only bounded data, the PDP is
called whenever a data request is received, and only when the decision
is 'Permit' are SQL queries generated from obligations and sent to the database
to retrieve data. This workflow guarantees that removing or updating a policy
does not affect the privacy of the data owner. However, this is not the case when
dealing with unbounded stream data. Instead of actual data, only a handle used
to retrieve the actual data stream from the data stream engine is returned as
the response to the user's request. The user then uses this handle to connect to
the back-end data stream engine for data streams. If the data stream owner for
some reason has removed or modified the policy that grants the user access to a
particular data stream, the user may still be connected to the data stream
although he is no longer supposed to be able to access it.

To solve this issue, we need to incorporate query graph management into
the framework. In addition to keeping track of the policies loaded, the data
server also keeps track of the query graphs that have been generated by the PEP
and already sent to back-end data stream engines. In the current eXACML+ design,
whenever a policy is removed or modified by its owner, all query graphs spawned
by the policy are immediately withdrawn from the back-end data stream engines.
This may not be a flexible solution, but it ensures data privacy and is not
easily compromised.
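
The bookkeeping described above can be pictured with a small registry that maps each policy to the query graphs it has spawned. The sketch below is purely illustrative: `QueryGraphRegistry`, its methods, the policy identifier and the handle strings are hypothetical names, and the `withdraw` callback stands in for whatever call the back-end engine actually offers for removing a running query.

```python
from collections import defaultdict

class QueryGraphRegistry:
    """Tracks which deployed query graphs were spawned by which policy."""

    def __init__(self):
        self._by_policy = defaultdict(set)

    def register(self, policy_id, handle):
        """Record that the query graph behind `handle` was derived from `policy_id`."""
        self._by_policy[policy_id].add(handle)

    def on_policy_changed(self, policy_id, withdraw):
        """Withdraw every query graph spawned by a removed or modified policy."""
        handles = sorted(self._by_policy.pop(policy_id, set()))
        for handle in handles:
            withdraw(handle)  # placeholder for the engine-specific removal call
        return handles

registry = QueryGraphRegistry()
registry.register("nea-weather-policy", "stream://engine/out-1")
registry.register("nea-weather-policy", "stream://engine/out-2")
registry.register("other-policy", "stream://engine/out-3")

withdrawn = []
registry.on_policy_changed("nea-weather-policy", withdrawn.append)
print(withdrawn)
```

Query graphs belonging to other policies are left untouched, and changing a policy that has no deployed graphs is a no-op.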

3.4 Multiple Aggregation Windows 

As described in Section 3.2, only a single access is permitted on a particular
data stream for one user at any time. We justify this constraint by showing an
example in which one can reconstruct the raw data stream by combining outputs
from multiple aggregation windows of different window sizes or advance steps.

Example 2. Suppose we have a single-attribute stream
$S = a_0, a_1, a_2, a_3, a_4, a_5, \ldots, a_n$. The access control policy for
$S$ allows for an aggregation window $w$, where w.size = 3, w.advancestep = 2,
w.type = tuple, w.attribute = a and w.function = sum. A user can request an
aggregation window $v$ using a customised query, provided that
v.size >= w.size, v.advancestep >= w.advancestep, v.type = w.type,
v.attribute = w.attribute and v.function = w.function. If multiple accesses are
allowed, the user can obtain multiple result streams using different $v$
simultaneously. Let $v_1$.size = 3, $v_2$.size = 4, $v_3$.size = 5 and all other
window specifications be identical to $w$; the framework will then return the
user three aggregated data streams $S_1$, $S_2$ and $S_3$, such that:

$S_1 = (a_0+a_1+a_2), (a_2+a_3+a_4), (a_4+a_5+a_6), \ldots$

$S_2 = (a_0+a_1+a_2+a_3), (a_2+a_3+a_4+a_5), (a_4+a_5+a_6+a_7), \ldots$

$S_3 = (a_0+a_1+a_2+a_3+a_4), (a_2+a_3+a_4+a_5+a_6), \ldots$

By computing $S_2 - S_1$ and $S_3 - S_2$, we obtain two new streams
$S' = a_3, a_5, a_7, \ldots$ and $S'' = a_4, a_6, a_8, \ldots$ Merging $S'$ and
$S''$, we can reconstruct all of the original stream except for the first three
tuples.
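
The leak in Example 2 is easy to reproduce numerically. The following Python sketch computes the three sum-aggregated streams for window sizes 3, 4 and 5 with advance step 2 and then recovers the raw values by subtraction and interleaving. The helper `window_sums` and the sample values are assumptions made for this illustration only.

```python
def window_sums(stream, size, step):
    """Sum over each full tuple-based sliding window of the given size and step."""
    return [sum(stream[i:i + size])
            for i in range(0, len(stream) - size + 1, step)]

# Hypothetical raw stream a_0, a_1, ..., a_11 that the policy is meant to hide.
stream = [4, 8, 15, 16, 23, 42, 7, 9, 11, 13, 2, 5]

s1 = window_sums(stream, 3, 2)
s2 = window_sums(stream, 4, 2)
s3 = window_sums(stream, 5, 2)

odd = [b - a for a, b in zip(s1, s2)]    # S2 - S1 = a_3, a_5, a_7, ...
even = [b - a for a, b in zip(s2, s3)]   # S3 - S2 = a_4, a_6, a_8, ...

# Interleave the two difference streams to rebuild a_3, a_4, a_5, ...
recovered = [value for pair in zip(odd, even) for value in pair]
print(recovered)
assert recovered == stream[3:3 + len(recovered)]
```

Only the first three tuples stay hidden; every later raw value is recovered exactly, which is why the framework allows a single access per user and stream.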



In general, given a set of result streams derived from different aggregation
windows with a fixed advance step $M$ and window sizes
$N, N+1, N+2, \ldots, N+M$, we can reconstruct the original stream from tuple
$a_N$ onwards. The inductive proof is as follows.

Suppose we have three sum aggregation windows with sizes $N$, $N+Q_1$ and
$N+Q_2$, where $Q_1 < Q_2$, and a fixed step size $M$. The first $k$ tuples of
the three streams are:

$S_0 = (a_0 + \ldots + a_{N-1}), (a_M + \ldots + a_{N+M-1}), \ldots, (a_{kM} + \ldots + a_{N+kM-1})$

$S_1 = (a_0 + \ldots + a_{N+Q_1-1}), (a_M + \ldots + a_{N+M+Q_1-1}), \ldots, (a_{kM} + \ldots + a_{N+kM+Q_1-1})$

$S_2 = (a_0 + \ldots + a_{N+Q_2-1}), (a_M + \ldots + a_{N+M+Q_2-1}), \ldots, (a_{kM} + \ldots + a_{N+kM+Q_2-1})$

Let $T_1 = S_1 - S_0$ and $T_2 = S_2 - S_1$; we then have

$T_1 = (a_N + \ldots + a_{N+Q_1-1}), \ldots, (a_{N+kM} + \ldots + a_{N+kM+Q_1-1})$

$T_2 = (a_{N+Q_1} + \ldots + a_{N+Q_2-1}), \ldots, (a_{N+kM+Q_1} + \ldots + a_{N+kM+Q_2-1})$

In a similar fashion, we can construct subsequent streams $T_i$ such that

$T_i = (a_{N+Q_{i-1}} + \ldots + a_{N+Q_i-1}), \ldots, (a_{N+kM+Q_{i-1}} + \ldots + a_{N+kM+Q_i-1})$

Let $Q_1 = 1$ and $Q_i = Q_{i-1} + 1$, i.e., each window contains one more
element than the previous window, with $Q_i \le M$. We can then simplify $T_1$
to $T_M$ as:

$T_1 = a_N, a_{N+M}, \ldots, a_{N+kM}$

$T_2 = a_{N+1}, a_{N+M+1}, \ldots, a_{N+kM+1}$

$\ldots$

$T_M = a_{N+M-1}, a_{N+2M-1}, \ldots, a_{N+(k+1)M-1}$

Combining $T_1$ to $T_M$ in an interleaved manner, we obtain the stream
$a_N, a_{N+1}, \ldots, a_{N+(k+1)M-1}$, which is the original stream except for
the first $N$ tuples $a_0, \ldots, a_{N-1}$. ■

3.5 Checking for Empty or Partial Result Set (NR/PR) 

As mentioned earlier, NR and PR warnings occur while the framework merges
two query graphs. Since all query graphs in our context are built with filter,
map and window-based aggregation operators, we simplify this problem by checking
how merging individual operators causes NR/PR cases. Let us first present the
detailed definitions of NR and PR and an example of NR/PR cases.

Partial Result Warning (PR): A system-issued warning stating that some
tuples in the requested stream may not be returned to the user due to a conflict
between the user query and some policies enforced on the stream.

Empty Result Warning (NR): A system-issued warning stating that none of
the tuples in the requested stream will be returned to the user due to a conflict
between the user query and some policies enforced on the stream. This must
be distinguished from the case where the user does not have access to the stream.

Example 3. Suppose we have a stream $S$ with a single attribute $a$, a filter
condition F1: a > 8 from the policy obligation and a filter condition F2: a > 5
from the user query. Let a part of $S$ be (..., 9,10,11,3,2,6,9,8,7,2,13,...).
The user query expects the output to be (...,9,10,11,6,9,8,7,13,...). However,
due to F1, tuples like 6,8,7 are filtered out and the actual stream the user
gets is (...,9,10,11,9,13,...). In this case, a PR warning is issued to the user,
stating that some tuples that fit his requirement may not be returned to him due
to certain access control policies enforced on the data stream. If we change F1
to a < 4, only 3,2,2 are retained after applying F1 on $S$. Clearly, any real
number less than 4 is never greater than 5, so none of the tuples will make
a > 5 true. In other words, the predicate "a < 4 AND a > 5" is always false no
matter what value a takes. Therefore, none of the tuples will be returned to
the user and an NR warning will be issued.

Now let us look at how to generate NR/PR warnings for each operator:

Map Operator: Suppose we have two map operators: $M_1$ from the policy and
$M_2$ from the user query, with attribute sets $S_1$ and $S_2$. If
$S_1 \cap S_2 = \emptyset$, alert NR. Otherwise, alert PR if
$S_1 \not\supseteq S_2$.

Aggregate Operator: Suppose we have two aggregation operators $A_1$ and
$A_2$, where $A_1$ comes from the policy and $A_2$ comes from the user query. We
apply the following rules: (1) If $A_1$.size > $A_2$.size, alert NR; (2) If
$A_1$.advancestep > $A_2$.advancestep, alert NR; (3) If
$A_1$.type $\ne$ $A_2$.type, alert NR; (4) If different aggregation functions
are applied to the same attribute in $A_1$ and $A_2$, alert NR; (5) For every
attribute $a$ in $A_2$, if $a$ is an attribute in $A_1$ and the aggregation
functions applied to $a$ in both $A_1$ and $A_2$ are the same, do not alert;
(6) In all other cases, alert PR.
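
The six aggregation rules translate almost line by line into code. The sketch below is one possible reading, with hypothetical names (`Window`, `check_aggregation`) and with rule (5) interpreted as "no alert when every attribute requested by the user appears in the policy with the same function"; it returns "NR", "PR" or `None` when no warning is needed.

```python
from dataclasses import dataclass

@dataclass
class Window:
    type: str        # "tuple" or "time"
    size: int
    step: int
    functions: dict  # attribute name -> aggregate function name

def check_aggregation(policy, user):
    """Return "NR", "PR" or None for a policy window A1 and a user window A2."""
    if policy.size > user.size:            # rule (1)
        return "NR"
    if policy.step > user.step:            # rule (2)
        return "NR"
    if policy.type != user.type:           # rule (3)
        return "NR"
    for attr, fn in user.functions.items():
        if attr in policy.functions and policy.functions[attr] != fn:
            return "NR"                    # rule (4)
    if all(attr in policy.functions for attr in user.functions):
        return None                        # rule (5)
    return "PR"                            # rule (6)

policy = Window("tuple", 5, 2, {"samplingtime": "lastval",
                                "rainrate": "avg",
                                "windspeed": "max"})

print(check_aggregation(policy, Window("tuple", 10, 2, {"rainrate": "avg"})))
print(check_aggregation(policy, Window("tuple", 10, 2, {"rainrate": "max"})))
print(check_aggregation(policy, Window("tuple", 10, 2, {"rainrate": "avg",
                                                        "humidity": "avg"})))
```

The third call yields PR because `humidity` is requested by the user but not exposed by the policy, so only part of the requested result can be produced.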

Filter Operator: Checking whether merging two filter operators gives rise to NR
or PR warnings is more complicated. We first define two terms: (1) a simple
expression $S$ is an expression of the form "$x$ op $v$", where $x$ is a
variable (in our case, an attribute name of a stream schema),
op $\in \{<, >, \ge, \le, =, \ne\}$, and $v$ is a number, or a string (only when
op is $=$ or $\ne$). (2) A complex expression $C$ is a logical predicate formed
by connecting simple expressions with NOT, OR or AND. In our case, $C$ is the
filter condition of a filter operator that belongs to a policy or a user query.

Suppose we have two filter operators $F_1$ and $F_2$ with conditions $C_1$ and
$C_2$ respectively. The following procedure is used to do the check:



x op v        x op' v
------------  ------------
$x > v$       $x \le v$
$x < v$       $x \ge v$
$x \ge v$     $x < v$
$x \le v$     $x > v$
$x = v$       $x \ne v$
$x \ne v$     $x = v$

Table 2. Rules to convert NOT (x op v) to x op' v

Step 1: Let $P = C_1$ AND $C_2$. Eliminate the NOT operators in $P$ using
De Morgan's laws and the rules listed in Table 2. Let the resulting expression
be $P_1$.

Step 2: Convert $P_1$ into its disjunctive normal form (DNF) $P_2$. Note that
each variable $S$ in $P_2$ is a simple expression. The conversion is done by
first changing $P_1$ to its postfix form and then evaluating the postfix
expression; both are standard stack-based algorithms that can be found easily,
for example in [8]. During the postfix evaluation, if the operator is AND, apply
the distributive law to the two operands. If the operator is OR, concatenate the
two operands with OR.

Step 3: Check for NR and PR in $P_2$ by calling the function
checkTwoSimpleExpression pairwise on every two $S$ within the same conjunctive
expression. If any function call returns NR or PR, mark the whole conjunctive
expression with NR or PR. If all conjunctive expressions are marked with PR or
NR, alert PR or NR, respectively. The cost of the whole procedure is bounded
by $O(kn^2)$, where $k$ is the number of conjunctive expressions in $P_2$ and
$n$ is the maximum number of $S$ in a single conjunctive expression.

The function checkTwoSimpleExpression takes as inputs two simple expressions
$S_1$ and $S_2$. Checking is only necessary when $S_1.x = S_2.x$. Since
there are six possible values of op $\in \{<, >, \le, \ge, =, \ne\}$, we need
$6^2 = 36$ comparisons to cover all combinations of op in $S_1$ and $S_2$. For
each comparison, there are three cases, i.e., $S_1.v > S_2.v$, $S_1.v < S_2.v$
and $S_1.v = S_2.v$. We show here an example of how to generate NR and PR alerts
for one comparison. Let $S_1 = x > v_1$ and $S_2 = x < v_2$. Figure 5 shows how
the warnings are produced given different values of $v_1$ and $v_2$, and
Example 4 illustrates how the whole procedure works.




If $v_1 > v_2$, alert NR. If $v_1 = v_2$, alert PR. If $v_1 < v_2$, alert PR.

Fig. 5. Checking PR/NR for $S_1 = x > v_1$ and $S_2 = x < v_2$



Example 4. Suppose we have C1 = (a>20 AND a<30) OR NOT(a≠40) and C2 =
NOT(a>10) AND b=20.

First, we eliminate NOT in P = C1 AND C2 as described in Step 1. After
elimination, we have P1 = ((a>20 AND a<30) OR a=40) AND a≤10 AND b=20.
To make it easier to read, we use the following substitutions: let A be
a > 20, B be a < 30, C be a = 40, D be a ≤ 10, E be b = 20, & be AND, and ||
be OR. P1 then becomes ((A&B)||C)&(D&E).

Then we convert P1 to its postfix form, which is A B & C || D E & &, and
evaluate it as described in Step 2. After evaluation, we have P2 = E & D &
C || E & D & B & A, which is in DNF and has a truth table identical to that
of P1.

In Step 3, we apply the function checkTwoSimpleExpression pairwise to the
simple expressions of the two conjunctive expressions in P2, which are
e1 = E & D & C and e2 = E & D & B & A. For e1, C(3,2) = 3 calls are made on
the simple expression pairs (E,D), (E,C) and (D,C). For e2, C(4,2) = 6 calls
are made on (E,D), (E,B), (E,A), (D,B), (D,A) and (B,A). Among all these
function calls, (D,C) returns NR, as a≤10 and a=40 cannot both be true for
any given value of a. Similarly, the function call on (D,A) also returns NR,
as a≤10 and a>20 contradict each other. Since neither e1 nor e2 can be true
for any value of a, e1||e2, whose truth table is identical to those of P1 and
P2, cannot be true for any value of a. In this case, an NR warning will be
returned to the user, stating that some predicates in his query contradict
some predicates in the policy and that no result will be produced even if he
has been granted access to the target stream.
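The NR part of Example 4 can be reproduced with a short sketch. It is not the
paper's checkTwoSimpleExpression: instead of enumerating the 36 operator
combinations, it tests a handful of candidate values, which is sufficient
under the assumption that attributes are real-valued. The tuple representation
and the names contradicts and no_result are hypothetical, PR detection is not
covered, and D is taken as a ≤ 10, i.e., NOT(a>10).

```python
# Hypothetical NR-only sketch: a simple expression is (attribute, op, value).
import operator
from itertools import combinations

OPS = {"<": operator.lt, ">": operator.gt, "<=": operator.le,
       ">=": operator.ge, "=": operator.eq, "!=": operator.ne}


def contradicts(s1, s2):
    """True if no real value satisfies both simple expressions."""
    (x1, op1, v1), (x2, op2, v2) = s1, s2
    if x1 != x2:
        return False  # different attributes never contradict each other
    lo, hi = min(v1, v2), max(v1, v2)
    # Both predicates are constant on each region cut out by v1 and v2,
    # so one probe per region is enough (assumes a real-valued attribute).
    candidates = [lo - 1, lo, (lo + hi) / 2, hi, hi + 1]
    return not any(OPS[op1](c, v1) and OPS[op2](c, v2) for c in candidates)


def no_result(dnf):
    """NR holds if every conjunction contains a contradicting pair."""
    return all(
        any(contradicts(p, q) for p, q in combinations(conj, 2))
        for conj in dnf
    )


A, B, C = ("a", ">", 20), ("a", "<", 30), ("a", "=", 40)
D, E = ("a", "<=", 10), ("b", "=", 20)
e1, e2 = [E, D, C], [E, D, B, A]
nr_alert = no_result([e1, e2])
```

Here `nr_alert` is `True`, because (D,C) contradicts in e1 and (D,A)
contradicts in e2. Like Step 3, the sketch only looks at pairs, so it follows
the paper's procedure rather than deciding satisfiability of a whole
conjunction at once.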

4 Prototype and Evaluation 

4.1 Prototype Implementation 

We have implemented a prototype of eXACML+ in Java. We use APIs provided by
StreamBase to manage and query data streams. Communication between clients,
proxies and servers is socket-based. For XACML+, we extended Sun's XACML
implementation [5], which is an open-source Java project for the XACML
standard. The prototype supports access control over data streams by
dynamically managing query graphs generated from user queries and policies.

4.2 System Evaluation 

The performance of the system is measured by the time taken to fulfil users'
requests on data streams. We compare the results with those of a system that
queries the StreamBase DSMS directly, which is referred to as the direct-query
system.

Hardware Setup: The prototype system is deployed in a cloud-like environment.
We make use of four machines in the experiments, which run the data server,
StreamBase, the proxy and the client interface, respectively. The machines
running the server and StreamBase are both IBM x3650 servers located in a
server room, each with two quad-core Intel Xeon E5450 processors and 32GB of
memory. The machine running the proxy is a mini workstation and the client
machine is one author's MacBook. All machines are connected via the
university's 100Mbps intranet. The StreamBase DSMS maintains a few real-time
data streams from various projects, such as weather data feeds from a number
of mini weather stations producing weather records at one-minute intervals.
There is also GPS track information from personal mobile devices.

Workloads: Workloads are formed by sequences of continuous queries. Each
continuous query corresponds to three files in the experiment: (1) a StreamSQL
script as the input to the direct-query system; (2) an XACML policy file whose
obligations form exactly the same query graph as the above StreamSQL script.
This file is loaded into eXACML+ to provide access control policies to the
PDP; (3) an XACML request file for requesting data streams from eXACML+,
which may also have a user query embedded inside. The request file contains
the credentials, resources and actions specified in the corresponding policy,
so that the PDP always permits the request and the PEP can generate query
graphs from obligations and user queries. The actual specifications of each
query graph are generated randomly, but we make sure that parameter names are
consistent with those in the stream schemas so that every query graph
generated by the PEP is valid. In this experiment, we form query graphs with
a pre-defined set of combinations of Filter (FB), Map (MB) and Aggregation
(AB) operators. The sequence of continuous queries follows two setups: (1)
each continuous query and its corresponding request appears only once (i.e.,
is unique) in the sequence; (2) the sequence follows a Zipf distribution,
which models the scenario where a small number of popular streams are
requested frequently. Such a request pattern is common in P2P file sharing
and web caching [10, 11], and we use it to verify the performance improvement
brought by the cache mechanism on query graphs in the proxy. The parameters
used to generate the workloads are listed in Table 3.



Variable          Value                          Description
----------------  -----------------------------  ------------------------------------
nDirectQueries    1500                           number of direct queries
directQueryDist   160:170:130:124:254:290:372    query graph composition (Single FB :
                                                 Single MB : Single AB : FB+MB :
                                                 FB+AB : MB+AB : FB+MB+AB)
nPolicies         1000                           number of unique policies
nRequests         1500                           number of matching requests
a                 0.223                          skew parameter for Zipf distribution
maxRank           300                            maximum rank of unique requests from
                                                 which Zipf distribution is generated

Table 3. Summary of parameters used in experiments
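A Zipf-distributed request sequence with the parameters of Table 3 could be
drawn as sketched below. The paper does not describe its generator, so this
is only one plausible reading: it assumes that the request of rank r is
chosen with probability proportional to 1/r^a, and the function names and the
fixed seed are hypothetical.

```python
# Hypothetical workload sketch: draw request ranks from a Zipf distribution.
import random


def zipf_weights(max_rank, skew):
    """Unnormalised Zipf weights 1/r^skew for ranks 1..max_rank."""
    return [1.0 / rank ** skew for rank in range(1, max_rank + 1)]


def zipf_request_sequence(n_requests, max_rank, skew, seed=0):
    """Return n_requests ranks in 1..max_rank; low ranks are most popular."""
    rng = random.Random(seed)  # fixed seed only to make runs repeatable
    ranks = range(1, max_rank + 1)
    weights = zipf_weights(max_rank, skew)
    return rng.choices(ranks, weights=weights, k=n_requests)


# nRequests = 1500, maxRank = 300 and a = 0.223 are taken from Table 3.
sequence = zipf_request_sequence(n_requests=1500, max_rank=300, skew=0.223)
```

Each entry of `sequence` is the rank of one of the 300 unique requests, so
repeated ranks correspond to repeated requests that a proxy cache can serve.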
Fig. 6. Overall Performance



Experiments: We measure system performance in terms of the time taken to
fulfil authorized data stream requests. Naturally, access control frameworks
incur additional overhead when handling requests from users. However, as we
will show in the results, the overhead of the framework is consistent most of
the time, and the cache mechanism implemented in the proxy does help improve
the overall performance. Before any user request is made, we need to load
policies onto the data servers so that the PDP can make decisions based on
them. Loading a policy onto a server takes a small amount of time, regardless
of the number of policies already loaded. The average loading time is 0.25
seconds with a standard deviation of 0.06 seconds.
Fig. 7. Detailed Processing Time of AC Requests



Figure 6(a) presents the results obtained by running the unique query/request
sequence. Unlike in eXACML, no actual data is transferred in the system, and
most direct queries and most requests to eXACML+ get a response very quickly,
i.e., in less than one second. The response time for direct queries is
consistent most of the time, which is reasonable as only the query graphs, in
the form of StreamSQL scripts, and data stream handles, in the form of URIs,
are exchanged between the direct-query system and the DSMS. Given a constant
environment in terms of network conditions, computational power, etc., the
response time should not vary much. eXACML+ incurs overhead and its curve is
less smooth than that of the direct-query system. We believe this is mainly
because of the additional network traffic among clients, proxies and servers,
which accounts for about two thirds of the total response time. The cost of
communication between multiple entities is also less predictable and subject
to change with large variance. Figure 7(a) and Figure 7(b) present the
detailed elapsed time for processing 100 and 1500 AC requests with eXACML+,
respectively. We can see that the time taken to make access control decisions
and to manipulate query graphs is less than 0.01 second for all requests and
is rather consistent despite the increasing number of requests and loaded
policies, which are 50 for Figure 7(a) and 1000 for Figure 7(b). The response
time of sending query graphs to the DSMS accounts for one third of the total
response time on average and has a much larger variance. We notice that there
are a few cases where sending query graphs to the DSMS takes much longer than
average, and these cases occur only at the beginning of the request
sequences. We believe this is due to the behaviour of the StreamBase API,
which needs more time to establish the initial connection to StreamBase than
to send subsequent queries. In general, the response time for eXACML+ to
process AC requests is consistent for over 99% of the requests, which
indicates that the whole system is scalable with respect to the number of
requests processed and the number of policies loaded.



Figure 6(b) shows how the overhead changes when the proxy enables caching.
Although eXACML+ does not outperform the direct-query system, the performance
improvement brought by caching is substantial. Unlike in eXACML, what is
cached in the proxy is not actual data but data stream handles, whose sizes
are significantly smaller. The performance improvement is thus not as
pronounced as in [3]. Nevertheless, caching leads to over 100% improvement
over non-cached requests for nearly 40% of the requests and at least 10%
improvement for the remaining requests. The results justify the importance of
having a cache mechanism implemented in the proxy when the request
distribution is heavy-tailed.
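The idea of caching stream handles rather than data can be illustrated with
a small sketch. The paper does not specify the proxy's cache design, so
everything here is an assumption: the class name HandleCache, the request key
and the resolve callback (standing in for the full round trip through the
server and the DSMS) are hypothetical, and eviction and invalidation on
policy changes are deliberately left out.

```python
# Hypothetical proxy-side cache: maps a request key to a stream handle (URI).
class HandleCache:
    def __init__(self):
        self._handles = {}
        self.hits = 0
        self.misses = 0

    def get(self, request_key, resolve):
        """Return the cached handle, calling resolve() only on a miss."""
        if request_key in self._handles:
            self.hits += 1
            return self._handles[request_key]
        self.misses += 1
        handle = resolve(request_key)  # expensive path: server and DSMS
        self._handles[request_key] = handle
        return handle


def fake_resolve(request_key):
    # Placeholder for the real request processing; returns a made-up URI.
    return "sb://example-host/stream-" + str(request_key)


cache = HandleCache()
for rank in [1, 2, 1, 1, 3, 2]:
    cache.get(rank, fake_resolve)
```

After the six requests above, the cache has recorded three misses (ranks 1,
2 and 3 seen for the first time) and three hits, which is the effect that a
skewed request sequence amplifies.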

5 Related Work 

There are a number of cloud-based systems that aim to provide data sharing
capabilities across the Internet. Dropbox [12] and iCloud [13] are examples
of commercial products that enable file sharing among individual devices
based on their cloud storage back-end. SenseWeb [14] and SensorBase [15], on
the other hand, allow users to upload and share their sensor data. These
systems support a coarse-grained access control model in which a user either
makes his data public, shares it with a list of people or just keeps it
private. They cannot deal with the access control scenarios we considered in
this paper and in [¥].

In recent years, time-series data, or stream data, management systems have
developed substantially. So far the most renowned commercial DSMS,
StreamBase [2], evolved from the famous Aurora system^] and its distributed
version Borealis [16]. Enforcing access control on data streams is still a
fresh topic in the research community. Carminati et al. [17], [18] proposed a
model and framework for enforcing access control over stream data. Rimma et
al. [19], [20] proposed to make use of punctuations embedded in the data
stream to enforce access control, while Lindner et al. [21], [22] proposed to
build an additional static layer on top of query engines. All these
approaches are built on coarse-grained access control models and lack the
capability to deal with scenarios where fine-grained policies are needed.

The use of XACML [1] in cloud systems has yet to receive further development.
[23] uses XACML in a grid environment where its role is to unify database
access control mechanisms from multiple parties. We foresee that XACML will
be more widely used in industry as it is the current OASIS standard and is
continuously evolving in response to new access control requirements. To the
best of our knowledge, our work in this paper is the first to use the XACML
model to enforce fine-grained access control policies over stream data.

6 Conclusion and Future Work 

In this paper, we have proposed eXACML+, which allows data stream owners to
share their data streams with other users in a secure and flexible manner
over a trusted cloud infrastructure. The main challenges stemmed from the
vital differences between bounded data (as in RDBMS) and unbounded stream
data (DSMS). We also explored the problem of possible privacy leaks caused by
allowing a single user to have access to multiple aggregated streams of one
master data stream. We also showed how to effectively detect whether a user
query will return an empty or only partial result due to mismatches with
access control policies. We have implemented the prototype eXACML+ system
over the Aurora data-model based StreamBase data stream management system.
Preliminary experimental results show our framework's efficacy. It incurs
constant and consistent overhead compared with direct queries (without access
control) on the data stream engine.

Our immediate plans are to migrate the framework to commercial cloud
environments such as Amazon EC2 [24] and Microsoft's Azure [25] for more
comprehensive evaluations with practical workloads instead of synthetic ones.
Moreover, we intend to use other stream engines such as APE and DB2 DSE [27]
to broaden the range of applications that our framework supports. On the
conceptual front, relaxing the trusted cloud model to incorporate more
accountability mechanisms is our primary next challenge.